What is a bandit arm?

A bandit arm is one of the options (actions) available in a multi-armed bandit problem. The name comes from slot machines, nicknamed "one-armed bandits." Each arm behaves like a separate machine: pulling it yields a random reward drawn from a distribution that the decision-maker does not know. In a multi-armed bandit problem, a decision-maker repeatedly chooses among two or more arms. The goal is to maximize cumulative expected reward, or equivalently to minimize regret, which is the shortfall relative to always pulling the best arm.

The challenge is that the decision-maker lacks complete information about each arm's payoff. It must therefore balance exploration (trying arms to gather information) against exploitation (choosing the arm that has performed best so far). When one arm consistently outperforms the others, it is usually called the "best" or "optimal" arm, and the decision-maker may choose to exploit it more heavily than the rest. Committing too early can still lead to suboptimal outcomes. The arm's apparent advantage may be due to chance, its reward distribution may change over time (a non-stationary bandit), or the decision-maker may stop exploring arms that would eventually outperform it.
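To make the exploration/exploitation trade-off concrete, here is a minimal sketch that simulates Bernoulli arms and chooses among them with epsilon-greedy. Each arm pays 1 with a fixed probability and 0 otherwise. Epsilon-greedy is only one common strategy; UCB and Thompson sampling are others. The names `BernoulliArm` and `epsilon_greedy`, the payoff probabilities, and `epsilon=0.1` are illustrative assumptions and are not taken from any particular library. The reported regret is the expected (pseudo-)regret: the summed gap between the best arm's true payoff probability and that of each arm actually chosen. It can only be computed in a simulation like this, where the true probabilities are known.

```python
import random
from typing import List, Sequence, Tuple


class BernoulliArm:
    """Hypothetical arm: pays 1 with probability p, otherwise 0."""

    def __init__(self, p: float) -> None:
        self.p = p

    def pull(self, rng: random.Random) -> int:
        return 1 if rng.random() < self.p else 0


def epsilon_greedy(
    arms: Sequence[BernoulliArm],
    n_rounds: int,
    epsilon: float = 0.1,
    seed: int = 0,
) -> Tuple[List[int], int, float]:
    """Run epsilon-greedy; return (pulls per arm, total reward, expected regret)."""
    rng = random.Random(seed)
    k = len(arms)
    counts = [0] * k        # how many times each arm was pulled
    values = [0.0] * k      # running mean reward observed for each arm
    total_reward = 0
    best_p = max(arm.p for arm in arms)
    regret = 0.0

    for _ in range(n_rounds):
        untried = [i for i in range(k) if counts[i] == 0]
        if untried:
            i = untried[0]                    # pull every arm once to seed the estimates
        elif rng.random() < epsilon:
            i = rng.randrange(k)              # explore: pick a uniformly random arm
        else:
            i = max(range(k), key=lambda j: values[j])  # exploit: best estimate so far

        reward = arms[i].pull(rng)
        counts[i] += 1
        values[i] += (reward - values[i]) / counts[i]   # incremental sample mean
        total_reward += reward
        regret += best_p - arms[i].p          # expected loss of this choice vs. the best arm

    return counts, total_reward, regret


if __name__ == "__main__":
    arms = [BernoulliArm(0.2), BernoulliArm(0.5), BernoulliArm(0.7)]
    counts, reward, regret = epsilon_greedy(arms, n_rounds=10_000, epsilon=0.1)
    print("pulls per arm:", counts)
    print("total reward:", reward)
    print(f"expected regret: {regret:.1f}")
```

With a fixed epsilon the strategy never stops exploring, so regret keeps growing roughly linearly with the number of rounds. Decaying epsilon over time, or switching to a method such as UCB, typically lowers regret when payoffs are stationary. If payoffs drift, a common tweak is to replace the sample mean with a constant step size so that recent rewards count more.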